Regional uniqueness of tree species composition and response to forest loss and climate change

The conservation and restoration of forest ecosystems require detailed knowledge of the native plant compositions. Here, we map global forest tree composition and assess the impacts of historical forest cover loss and climate change on trees. The global occupancy of 10,590 tree species reveals complex taxonomic and phylogenetic gradients determining a local signature of tree lineage assembly. Species occupancy analyses indicate that historical forest loss has significantly restricted the potential suitable range of tree species in all forest biomes. Nevertheless, tropical moist and boreal forest biomes display the lowest level of range restriction and harbor extremely large ranged tree species, albeit with a stark contrast in richness and composition. Climate change simulations indicate that forest biomes are projected to differ in their response to climate change, with the highest predicted species loss in tropical dry and Mediterranean ecoregions. Our findings highlight the need for preserving the remaining large forest biomes while regenerating degraded forests in a way that provides resilience against climate change.

Figure 1 is confusing. Why not show the actual result of the ordinations, rather than colouring points based on position in 3 dimensional space. Environmental vectors can be mapped onto the ordinations. I think it is important that the reader sees these ordinations in the main text to know how good they are.
Lines 209-210 - small ranges in tropics don't necessarily reflect ecological specialism, but could just be product of non-equilibrium dynamics (sensu Hubbell or Ricklefs).
Give species authority when first mentioning it.
Results section actually includes lots of discussion of results (e.g., lines 209-214 and 254-255), but not referencing the literature (e.g., lines 172-174... this is not a novel result... see work by Qian and Ricklefs and others). Discussion either needs to be moved to discussion or appropriate references should be placed in that section.
Why use 2000 as forest cover year? Lots of deforestation since then and data are out there for more recent years.
Lines 229-230 - I would advocate using the IUCN ecosystem typology. Those ecosystems (and biomes) are functional and probably make more sense to use.
The elevational range shifts in the results need more explanation.
Lines 265-266 - Given the quality of globally available soils data, it is not surprising that edaphic factors do not come out as that important. This should be acknowledged.
Line 338 - Although their figure 2d suggests that the majority of species at tropical latitudes have very small ranges
Line 415 - why these particular soil variables? Please justify.
Line 459 - what was the stress value of this NMDS? It seems unlikely that it would be below an acceptable threshold for a community composition dataset spanning the globe.
Line 464 - how much variation explained by these three axes?

Reviewer #2 (Remarks to the Author):
In this manuscript, entitled "Regional uniqueness of tree species composition and response to forest loss and climate change", the authors investigated the species composition using 10590 tree species globally, and tree species range dynamics in both historical and future contexts. While the subject matter is captivating, there are some major concerns about the methodology and overall logic presented in the current version.
The manuscript should include detailed step-by-step instructions for the main analyses conducted, making it easier for readers to follow and repeat the process. The current version of the manuscript lacks comprehensive details in several areas, such as the data cleaning process. Specifically, the authors mentioned removing duplicates from occurrences collected from 13 databases, but this section requires further elaboration. As the accuracy and completeness of the compiled data are fundamental to the study, a more focused and in-depth explanation of the data cleaning process is essential. In addition, to strengthen the validity of the compiled data, it would be ideal to provide additional information about the validation process. While the authors briefly mentioned validation using sPlot data in Line 436, this section should be expanded to instill confidence in the dataset used. Some specific comments to the method are listed below.
The link between the main results concerning the uniqueness of species composition and species range changes seems rather weak now. While phylogenetic diversity was utilized in the first section, its role in subsequent analyses should be clarified. Exploring how species composition might change in the future would be an interesting addition, potentially offering deeper insights into the implications of climate change on tree species distributions.

Specific comments:
Line 53-57, did the authors indicate those factors drive all the regional turnover globally?
Probably not.
Line 64, it is not clear why phylogenetic composition is important here. Probably needs some introductory content previously.
Line 73-74, again, see the two papers for Lines 60-63.And a resolution of 100 km used here could be a bit coarse for some experts.
Line 85, could be helpful to specify what "climate-smart" refers to.
Lines 57-91, I agree that these are important.However, after reading the manuscript, it seems this study did not overcome those limitations, or the algorithms used did not seem novel to many SDM users.
Lines 170-172, maybe I missed something, but I did not see this pattern from Fig. S4.
Line 265, were any results here related to this statement (historical factors)?
Lines 272-285, it is true these techniques are important to macro-scale ecology studies; however, I do not see this paragraph is relevant here, particularly as the second paragraph of the discussion.
Lines 383 and on, why this 1000 km was selected?It is not a short distance and over the size of many European countries, thus, could cover nonnative countries.
Lines 427-430, it is a bit confused and how to be sure which variables are important to specific species?

Reviewer #3 (Remarks to the Author):
The manuscript is interesting and have great potential, I like the idea and I also mostly like the multiple methodology used to develop the research, however there are some issues that need attention.
L28-29 "Nevertheless, tropical moist and boreal forest biomes still harbor extremely large ranged tree species." Well this is not necessarily entirely true because these are the two most extreme forest biomes, the former well known to be stable since a long period of time, the latter composed by few species that have large distributional ranges. I think this need careful interpretation.
L30 "to differ IN" I think in should not be capitalised.
L96 "for which we had sufficient occurrence data" how was this assessed?
L113-114 "combining geographic range polygons based on reported native countries and species distribution modelling" I have some concern about the use of distribution at country level because each country have very different size and thus this cannot really be considered a measurement unit. The authors explained this in the method section but this should be better highlighted here as well.
L115 "at a 30-arc second resolution" this is my main concern, I strongly suggest to change the projection of the study into an equal-area projection because this is the way how global spatial biodiversity assessments should be made, species-presence-absence and associated processes are scale and spatial dependent.
L117-118 "with at least 20 spatially explicit observations and available reported native ranges from GlobalTreeSearch" why 20 observations is considered an adequate number?
And how much should they spatially separated?
L119, I think the authors should clearly separate the choice of selection the species to be modelled: from one side based on biological basis (e.g. sufficient number of observation representative of the natural range) and on technical basis (e.g., sufficient number of observation to run reliable models).
L144-146 I would consider the fire season length instead of too narrow environmental variables like soil properties.

We would like to thank you for the opportunity to submit a revised version of our manuscript "Regional uniqueness of tree species composition and response to forest loss and climate change" for publication in Nature Communications. We thank you for your time reviewing our manuscript and your constructive comments that allowed us to greatly improve our study. We provide a point-by-point response to each of the reviewers' comments below. All modifications to the manuscript, including slight changes in the wording of certain sentences, have been marked in blue. Finally, we added data and code availability statements to our manuscript, including a link to the code necessary to replicate our study.

Reviewer #1 (Remarks to the Author):
I found the paper well-written and intriguing. The extent of analysis is huge, and a Nat Comms paper is a short space to give such complexity appropriate attention. Still, the authors do manage to pull out meaningful results. I am generally not a fan of very 'mega' papers like this, but the authors do seem to have been pretty careful in their choice of methods and acknowledge limitations in the main text. Thus, overall, I actually only have fairly minor suggestions, detailed below.
Response: We would like to thank the reviewer for the positive evaluation of our work and for the pertinent comments.
How is their tree species mapping different from previous efforts? I would assume they think theirs is better, but this needs to be explained. For example, see: Serra-Diaz, J. M., Enquist, B. J., Maitner, B., Merow, C. & Svenning, J. C. Big data of tree species distributions: how big and how good? For. Ecosyst. 4, 0-12 (2017).
Response: The main technical innovation provided for our mapping pipeline was the implementation of a species distribution modeling algorithm in a cloud-computing platform. Additionally, our study benefits from the large amounts of publicly available occurrence data and our efforts to compile multiple databases. The size of our dataset is on par with that of the Serra-Diaz et al. paper. However, while the paper by Serra-Diaz et al. presents an impressive analysis of the geographical coverage of occurrence data globally, it does not include any species distribution modeling as in our study. Here, the combination of large datasets and our cloud-based environmental niche algorithm allowed us to generate continent-wide maps at a higher resolution than previously done, of 30 arc seconds, for a large number of species and with high precision thanks to the high number of occurrence records. We modified lines 113-114 to clarify these two aspects. Furthermore, as underlined in the manuscript in lines 145-147, our approach combines species distribution modeling and geographic range limits, which was not the case with previous large-scale modeling efforts.
One of the main results emphasised is the 'high uniqueness' of composition in any given place on the globe, but this is not backed up by any analysis. This statement needs to be justified somehow.
Response: Thank you for this very relevant comment. We agree that our paper lacked some analyses to support the result of near-unique tree species composition across the Earth. We performed some additional analyses which are illustrated in figures in the supplementary material and are included in the results section.
Figure S5 shows the distributions of the ordination axes for both the taxonomic and phylogenetic ordinations in one and two dimensions. Figure S6 illustrates our revisited clustering analysis of the two ordinations: a plot of the silhouette coefficient that measures cluster quality against the number of clusters, and maps of the clustered taxonomic and phylogenetic ordinations using the number of clusters that maximized the score. The analysis is explained in the methods section in lines 507-513. The lack of structure in the distributions combined with the lack of well-defined, fine-grained clusters indicates the uniqueness of tree species composition across the globe. We note that the taxonomic ordination showed smooth gradients with the best clustering generating only two clusters, which is not particularly informative as they retain a high level of intra-cluster variance. The phylogenetic ordination contains more structure: the most striking example of this is a peak found in the phylogenetic ordination around 0.25, 0.0, and -0.35 on each axis respectively (Figure S5f). This corresponds to an area of very homogeneous phylogenetic composition spanning Northern Canada and Alaska (Figure 1d) and one of the five clusters obtained for this ordination. We included these results in lines 176-183. These elements were also included in the discussion in lines 316-318.
The authors could have validated their modelling for regions where tree species have their ranges mapped. That would be a good proof of concept.
Response: We agree that this would provide a good validation. Hence, we have included a comparison of our modeled distributions to the maximum habitat suitability (MHS) maps from the "Tree species distribution data and maps for Europe" report from the European Commission (https://data.europa.eu/doi/10.2760/489485) in Figure S4, in the results section in lines 141-144, and the methods section in lines 488-491. We computed the intersection over union (IoU) of the areas covered by both maps for 23 species for which the data was available. While there are some considerable differences between both maps, we note that vast areas of agreement are found and that the discrepancies are generally due to our models being more conservative.
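As an illustration of the intersection over union (IoU) measure mentioned in the response above, a minimal sketch is given here. It assumes both maps are available as equally shaped binary grids (1 = suitable, 0 = not suitable) and counts pixels without weighting them by area; the function name and the toy grids are hypothetical and are not taken from the authors' code.

```python
def intersection_over_union(map_a, map_b):
    """IoU of two equally shaped binary grids given as lists of rows (0/1)."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(map_a, map_b):
        for a, b in zip(row_a, row_b):
            a, b = bool(a), bool(b)
            intersection += a and b  # pixel suitable in both maps
            union += a or b          # pixel suitable in at least one map
    if union == 0:
        return float("nan")          # neither map predicts any suitable pixel
    return intersection / union


# Toy example (hypothetical data): the modeled map is a subset of the
# reference map, i.e. the model is the more conservative of the two.
modeled = [[1, 1, 0],
           [0, 1, 0],
           [0, 0, 0]]
reference = [[1, 1, 1],
             [0, 1, 1],
             [0, 0, 0]]

iou = intersection_over_union(modeled, reference)
print(f"IoU = {iou:.2f}")
```

In this toy case three pixels are shared and five are covered by at least one map, so a conservative model that never over-predicts still scores an IoU well below 1.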
The authors use 10% tree cover as their threshold for forest. What about trees in savannas? Or the work showing the large number of trees occurring in drylands (e.g., Reiner et al. 2023, Nat Comms; Brandt et al. 2020, Nature)?
Response: We agree that constraining species range to areas with at least 10% tree cover excludes parts of species' ranges that may occur outside of forests. This may lead to underestimated realized ranges of some tree species. However, we still consider that the range reduction is indicative of the extent of historical forest loss. We agree it is important to acknowledge this limitation and have added it in the discussion in lines 331-333.
Can they give us more info on this "which limits the number of false negatives while tolerating some false positives"? That is rather vague. What is 'some false positives'? Can this be estimated?
Response: We agree that this was a rather vague formulation. We reformulated this sentence to indicate that the most frequent type of errors made by our models were false positives (predictions of suitable habitat where the species has not been recorded) and included the false positive and false negative rates in lines 135-137.
Also, is this mis-phrased: "false positives are not necessarily problematic when comparing presence-absence data to habitat suitability maps, as the absence of a species does not mean the habitat is not suitable" I would have expected that sentence to be about false negatives.
Response: We agree that this sentence was not very clear. However, it is intended to be about false positives, which are locations that the model has classified as suitable although no occurrence was recorded there (i.e., falsely classified as positive/suitable). As we use presence-only data, the species may be present but not recorded. Alternatively, the habitat may be suitable with regard to the variables taken into account in the model even if the species is not found there due to other factors, such as biotic interactions, human influence, or stochasticity. We rephrased this sentence to clarify it in lines 137-140.
Figure 1 is confusing. Why not show the actual result of the ordinations, rather than colouring points based on position in 3 dimensional space. Environmental vectors can be mapped onto the ordinations. I think it is important that the reader sees these ordinations in the main text to know how good they are.
Response: We agree that our visualization of the ordinations in environmental and geographic space may be unconventional. We also appreciate and agree with the importance of visualizing the ordinations in the ordination space. Therefore, we added the visualizations in ordination space with 2-dimensional plots in the supplementary material in Figure S4, with the same coloring scheme as used in Figure 1. This figure is referenced in the main results in lines 168 and 170. Nonetheless, we prefer to keep the ordinations mapped to colors in a 3-dimensional space, as this allows us to create maps showing more than one ordination axis. Moreover, as these ordinations are in taxonomic and phylogenetic space, respectively, we cannot map the environmental vectors in the ordination space in Figure S4. Environmental vectors are illustrated in Figure 1a,c, where we see that the same environmental conditions can be associated with different ordination values.
wide maps at a high resolution of 30 arc seconds, which, in turn, allowed us to study entire species ranges and range shifts along latitude and elevation with sufficient precision.We, therefore, consider that highlighting the technical aspects and the datasets that allowed us to carry out this study is relevant to the discussion.Nonetheless, we shortened it slightly as we appreciate that these elements may not be of the main interest to some readers.
Lines 383 and on, why this 1000 km was selected? It is not a short distance and over the size of many European countries, thus, could cover nonnative countries.
Response: We agree that 1,000 km is a large buffer size. On one hand, it was selected to compensate for potential gaps in the native country dataset, as the lack of data in many regions due to sampling biases may lead to incomplete native-country datasets. On the other hand, as we wanted to use the same range polygon when predicting distributions with current and future climatic variables, we decided to add such a large buffer so that the range polygon would be large enough to allow the species distribution to shift under climate change. We acknowledge that this comes with the limitation that the range will therefore cover nonnative countries. We added explanations in the methods section in lines 409-411.
Lines 427-430, it is a bit confused and how to be sure which variables are important to specific species?
Response: We agree that this section was confusing. For species with fewer than 90 observations, we trained a random forest model with all predictors to obtain the variable importance for each predictor. Then, the top predictors were selected such that there were at least 10 observations per predictor (e.g., two predictors were selected for species with 20-29 occurrence records) and the final model was trained with those predictors. We clarified this section in lines 456-461.
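The selection rule described in this response (at least 10 observations per retained predictor) can be sketched as follows. This is only an illustration under stated assumptions: the importance scores are taken as already computed by a random forest trained on all predictors, and the function names, predictor names, and importance values are hypothetical.

```python
def n_predictors_for(n_occurrences, min_obs_per_predictor=10):
    """Largest number of predictors that keeps >= 10 observations per predictor."""
    return n_occurrences // min_obs_per_predictor


def select_top_predictors(importances, n_occurrences):
    """Keep the most important predictors allowed by the observation count.

    `importances` maps predictor name -> importance score, assumed to come
    from a preliminary random forest fitted with all predictors.
    """
    k = n_predictors_for(n_occurrences)
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:k]


# Hypothetical importance scores for a species with 25 occurrence records.
importances = {
    "mean_annual_temperature": 0.40,
    "annual_precipitation": 0.30,
    "soil_ph": 0.20,
    "elevation": 0.10,
}
print(select_top_predictors(importances, n_occurrences=25))
```

With 25 records the rule allows two predictors, which matches the example in the response of two predictors for species with 20-29 occurrence records.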

Reviewer #3 (Remarks to the Author):
The manuscript is interesting and have great potential, I like the idea and I also mostly like the multiple methodology used to develop the research, however there are some issues that need attention.
Response: We would like to thank the reviewer for the encouraging evaluation of our work. We hope we have addressed all issues in the revised version and our answers below.
L28-29 "Nevertheless, tropical moist and boreal forest biomes still harbor extremely large ranged tree species." Well this is not necessarily entirely true because these are the two most extreme forest biomes, the former well known to be stable since a long period of time, the latter composed by few species that have large distributional ranges. I think this need careful interpretation.
Response: We modified the sentence to acknowledge that the tropical moist and boreal forest biomes have significant differences in richness and composition, while still conveying our results that these biomes have the lowest levels of range restriction across all other forest biomes and host large-ranged tree species. It can be found in lines 28-30. Furthermore, we would like to note that the results section contains a more extensive explanation of the differences between these two biomes in lines 216-224.
L30 "to differ IN" I think in should not be capitalised.
Response: Thank you for pointing this out. It was corrected in line 31.
L96 "for which we had sufficient occurrence data" how was this assessed?
Response: We agree that this is rather vague. We selected species with more than 90 occurrence records as this was found to train models with adequate predictive performance. This was clarified in lines 102-103. We also clarified this in the results section in line 128.
L113-114 "combining geographic range polygons based on reported native countries and species distribution modelling" I have some concern about the use of distribution at country level because each country have very different size and thus this cannot really be considered a measurement unit. The authors explained this in the method section but this should be better highlighted here as well.
Response: We agree that country-level native ranges are not ideal due to the variation in country sizes. However, to our knowledge, these are the most detailed global data that are currently available. Nonetheless, the ranges also depend on the location of occurrences within the reported native range and they represent only a coarse estimate of species' native ranges. We added these details in lines 122-125.
L115 "at a 30-arc second resolution" this is my main concern, I strongly suggest to change the projection of the study into an equal-area projection because this is the way how global spatial biodiversity assessments should be made, species-presence-absence and associated processes are scale and spatial dependent.
Response: We agree that scale-dependency is an important issue when conducting spatial biodiversity analyses, but we consider it less pressing when modeling habitat suitability of individual species. In this case, scale dependency would boil down to the question of whether the environmental predictors systematically differ with area covered by a pixel. The Arctic tree line extends to about 70°N, and therefore the area covered by pixels of our 30-arc second grids differs by a factor of three, roughly between 342,000 m² and 1 km². Across these scales, we do not see an issue for the environmental covariates used here. Reprojecting these layers into equal area projections, on the other hand, would introduce errors into the model covariates which would propagate to the results. Nevertheless, although our modeling pipeline uses a 30-arc second resolution, the actual area of each pixel was taken into account for all relevant downstream analyses (range sizes in Figure 2 and latitude and elevation shifts in Figure 3).
L117-118 "with at least 20 spatially explicit observations and available reported native ranges from GlobalTreeSearch" why 20 observations is considered an adequate number?
Response: While the choice of the number of minimum observations to construct a model remains to some extent arbitrary, other studies have also used 20 as a minimum. We added a reference to one of these in lines 396-397 (Serra-Diaz et al. 2018). More importantly, this is only taken as an initial minimum number, which we then increase to 90 following the results in Figure S1, which is the relevant number for the downstream analyses. We clarified this in the methods in lines 480-482.
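The pixel-area argument in the response on the 30-arc second resolution can be checked with a short calculation. The sketch below assumes a spherical Earth with a mean radius of 6,371 km, which is a simplification and not necessarily what the authors used; the function and constant names are hypothetical. On a sphere, the area of a grid cell shrinks with the cosine of latitude, so the equator-to-70°N ratio is close to 1 / cos(70°).

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation (assumption)


def pixel_area_m2(lat_deg, res_arcsec=30):
    """Area of a lat/lon grid cell centred at `lat_deg` on a sphere."""
    d = math.radians(res_arcsec / 3600)  # cell width and height in radians
    lat = math.radians(lat_deg)
    # Exact spherical cell area: R^2 * d_lon * (sin(lat_top) - sin(lat_bottom))
    return EARTH_RADIUS_M ** 2 * d * (math.sin(lat + d / 2) - math.sin(lat - d / 2))


area_equator = pixel_area_m2(0.0)
area_treeline = pixel_area_m2(70.0)
print(f"equator: {area_equator:,.0f} m^2")
print(f"70 deg N: {area_treeline:,.0f} m^2")
print(f"ratio: {area_equator / area_treeline:.2f}")
```

Under this approximation the equatorial pixel is somewhat smaller than 1 km², but the ratio between the two latitudes is about 2.9, consistent with the "factor of three" stated in the response.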
And how much should they spatially separated?
Response: The observations are aggregated at 30-arc seconds, corresponding to the resolution of the model covariates. This means they are spatially separated by 30-arc seconds which corresponds to about 1 km at the equator. We clarified this in line 122.
L119, I think the authors should clearly separate the choice of selection the species to be modelled: from one side based on biological basis (e.g. sufficient number of observation representative of the natural range) and on technical basis (e.g., sufficient number of observation to run reliable models).
Response: Thank you for this comment. The criteria for the selection of species was made on a technical basis as we selected the number of observations with which the model performance is expected to be sufficient. We rephrased this in line 128 to clarify this.
L144-146 I would consider the fire season length instead of too narrow environmental variables like soil properties.
Response: We agree that variables that characterize fire season length or fire frequency and severity are likely to be important. However, as we used the same model covariates for all species, we selected variables of general ecological relevance across all biomes. Furthermore, the soil variables that we included can capture critical insights about the nutrient limitations that influence species ranges. We added a sentence in the discussion to address the limitations associated with the choice of model covariates in lines 295-297.
Fig 1, I am really surprised that both taxonomic and phylogenetic ordinations are so similar, is this related to the methodology used (e.g. evopcahellinger)?
Response: The two ordinations look quite similar as we chose the mapping of the ordination axes to red, green, and blue such that the colors would match as best as possible between both ordinations. However, the methodologies used for both ordinations are quite different, as explained below. For the taxonomic ordination, we used NMDS which uses rank order of the dissimilarity between sites. Here, we used the Sørensen index (which is equivalent to the Bray-Curtis index for presence/absence data) to quantify the turnover in species composition between sites. On the other hand, for the phylogenetic ordination, we used an evolutionary PCA based on Hellinger distance, where the branch lengths of a phylogenetic tree, or evolutionary units, are used as the basic entities instead of species. This inclusion of evolutionary units in this approach makes it distinct from the approach used for the taxonomic ordination.
Fig2-Fig3 how have the biomes been identified? Which classification have been used?
Response: We used the RESOLVE biome classification. The appropriate reference (Dinerstein et al. 2017) was added in lines 242, 529 and 539.
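The equivalence between the Sørensen and Bray-Curtis indices for presence/absence data, mentioned in the response on the taxonomic ordination, can be illustrated with a small example. This is a sketch with hypothetical site compositions and function names, not the authors' implementation. With a shared species and b, c species unique to each site, both dissimilarities reduce to (b + c) / (2a + b + c).

```python
def sorensen_dissimilarity(site_a, site_b):
    """1 - 2*shared / (richness_a + richness_b) for two non-empty species sets."""
    a, b = set(site_a), set(site_b)
    shared = len(a & b)
    return 1 - 2 * shared / (len(a) + len(b))


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance (or 0/1) vectors."""
    numerator = sum(abs(xi - yi) for xi, yi in zip(x, y))
    denominator = sum(xi + yi for xi, yi in zip(x, y))
    return numerator / denominator


# Hypothetical species lists for two sites.
site_a = {"Picea abies", "Pinus sylvestris", "Betula pendula"}
site_b = {"Pinus sylvestris", "Betula pendula", "Quercus robur"}

# Encode both sites as presence/absence vectors over the pooled species list.
pool = sorted(site_a | site_b)
x = [1 if sp in site_a else 0 for sp in pool]
y = [1 if sp in site_b else 0 for sp in pool]

print(sorensen_dissimilarity(site_a, site_b))
print(bray_curtis(x, y))
```

Both calls give the same value for these two sites, which share two of their three species each.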
